167 research outputs found

    Utilisation de la langue naturelle pour l'interrogation de documents structurés

    http://www.asso-aria.org/coria/2005/19.pdf International audience. The query language is the indispensable interface between the user and the search tool. Kept as simple as possible when engines mainly index flat documents, it becomes highly complex when it targets structured documents and must express constraints on both structure and content. The approach described here proposes using natural language as the interface for expressing such queries. The article first describes the successive phases that transform, in an information retrieval setting, the natural language query into a context-independent semantic representation. Simplification rules adapted to the structure and domain of the corpus are then applied, yielding a final form suited to conversion into a formal query language. Finally, the article describes the experiments carried out and draws initial conclusions on various aspects of this approach.

    Justification of Answers by Verification of Dependency Relations-The French AVE Task.

    International audience. This paper presents the LIMSI results in the Answer Validation Exercise (AVE) 2008 for French. We tested two approaches during this campaign: a syntax-based strategy and a machine learning strategy. Results of both approaches are presented and discussed.

    Supervised Machine Learning Techniques to Detect TimeML Events in French and English

    International audience. Identifying events in texts is an information extraction task necessary for many NLP applications. Through the TimeML specifications and the TempEval challenges, it has received some attention in recent years; yet no reference result is available for French. In this paper, we try to fill this gap by proposing several event extraction systems, combining for instance Conditional Random Fields, language modeling and k-nearest-neighbors. These systems are evaluated on French corpora and compared with state-of-the-art methods on English. The very good results obtained on both languages validate our whole approach.

    Question Generation for French: Collating Parsers and Paraphrasing Questions

    This article describes a question generation system for French. The transformation of declarative sentences into questions relies on two different syntactic parsers and on named entity recognition tools. This makes it possible to further diversify the questions generated and possibly to alleviate the problems inherent to the analysis tools. The system also generates reformulations of the questions based on variations in the question words, inducing answers with different granularities, and on nominalisations of action verbs. We evaluate the questions generated for sentences extracted from two different corpora: a corpus of newspaper articles used for the CLEF Question Answering evaluation campaign and a corpus of simplified online encyclopedia articles. The evaluation shows that the system is able to generate a majority of good and medium quality questions. We also present an original evaluation of the question generation system using the question analysis module of a question answering system.

    Évaluation de la contextualisation de tweets

    National audience. This article addresses the evaluation of tweet contextualization. Contextualization is defined as a summary that puts back into context a text which, because of its size, does not contain all the elements a reader needs to understand all or part of its content. We define an evaluation framework for tweet contextualization that generalizes to other short texts. We propose a reference collection together with ad hoc evaluation measures. This evaluation framework was successfully tested in the context of the INEX Tweet Contextualization campaign. In light of the results obtained during this campaign, we discuss the measures used in relation to other measures in the literature.

    Overview of INEX Tweet Contextualization 2013 track

    International audience. Twitter is increasingly used for on-line client and audience fishing; this motivated the tweet contextualization task at INEX. The objective is to help a user understand a tweet by providing a short summary (500 words). This summary should be built automatically from local resources such as Wikipedia, by extracting relevant passages and aggregating them into a coherent summary. The task is evaluated on informativeness, which is computed using a variant of Kullback-Leibler divergence and passage pooling. Meanwhile, the effective readability of summaries in context is checked using binary questionnaires on small samples of results. The task has run since 2010, and results show that only systems that efficiently combine passage retrieval, sentence segmentation and scoring, named entity recognition, text POS analysis, anaphora detection, a content diversity measure, and sentence reordering are effective.

    Overview of INEX Tweet Contextualization 2014 track

    International audience. Messages 140 characters long are rarely self-contained. The Tweet Contextualization task aims to automatically provide information, in the form of a summary, that explains the tweet. This requires combining multiple types of processing, from information retrieval to multi-document summarization, including entity linking. The task has run since 2010; the 2014 edition was a slight variant of previous ones, considering more complex queries from RepLab 2013. Given a tweet and a related entity, systems had to provide some context about the subject of the tweet from the perspective of the entity, in order to help the reader understand it.

    Impact of translation on biomedical information extraction from real-life clinical notes

    The objective of our study is to determine whether using English tools to extract and normalize French medical concepts from translations provides performance comparable to French models trained on a set of annotated French clinical notes. We compare two methods: one involving French language models and one involving English language models. For the native French method, the Named Entity Recognition (NER) and normalization steps are performed separately. For the translated English method, after the initial translation step, we compare a two-step method and a terminology-oriented method that performs extraction and normalization at the same time. We used French, English and bilingual annotated datasets to evaluate all steps (NER, normalization and translation) of our algorithms. The native French method performs better than the translated English one, with a global F1 score of 0.51 [0.47;0.55] against 0.39 [0.34;0.44] and 0.38 [0.36;0.40] for the two English methods tested. In conclusion, despite the recent improvement of translation models, there is a significant performance difference between the two approaches in favor of the native French method, which is more efficient on French medical texts, even with few annotated documents. Comment: 26 pages, 2 figures, 5 tables

    Good practices for clinical data warehouse implementation: a case study in France

    Real World Data (RWD) holds great promise for improving the quality of care. However, specific infrastructures and methodologies are required to derive robust knowledge and bring innovations to the patient. Drawing upon the national case study of the governance of the 32 French regional and university hospitals, we highlight key aspects of modern Clinical Data Warehouses (CDWs): governance, transparency, types of data, data reuse, technical tools, documentation and data quality control processes. Semi-structured interviews and a review of reported studies on French CDWs were conducted from March to November 2022. Out of 32 regional and university hospitals in France, 14 have a CDW in production, 5 are experimenting, 5 have a prospective CDW project, and 8 did not have any CDW project at the time of writing. The implementation of CDWs in France dates from 2011 and accelerated in late 2020. From this case study, we draw some general guidelines for CDWs. The current orientation of CDWs towards research requires efforts in governance stabilization, standardization of data schemas, and development of data quality and data documentation. Particular attention must be paid to the sustainability of the warehouse teams and to multi-level governance. The transparency of the studies and of the data transformation tools must improve to allow successful multi-centric data reuse as well as innovations in routine care. Comment: 16 pages

    Utilisation de la syntaxe pour valider les réponses à des questions par plusieurs documents.

    National audience. This article presents FIDJI, a question answering system for French that combines syntactic information about the question and the documents with more traditional techniques from the field, such as named entity recognition and term weighting. In particular, we experiment in this system with validating answers across several documents, as well as with specific techniques for answering different types of questions (such as questions expecting multiple answers (lists) or a Boolean answer).